Here we show the code to reproduce the analyses of: Risso and Pagnotta (2020). Per-sample standardization and asymmetric winsorization lead to accurate classification of RNA-seq expression profiles. In preparation.

This file belongs to the repository: https://github.com/drisso/awst_analysis.

The code is released with license GPL v3.0.

Install and load awst

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("drisso/awst")
library(awst)

Data import and cleaning

The SEQC datasets are available through the seqc Bioconductor package, which can be installed with:

BiocManager::install("seqc")

We next build the data matrix from the “ILM_aceview” experiments. We remove duplicate gene symbols, ERCC spike-ins, and genes with no ENTREZ ID.
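A minimal sketch of these three filters on a toy table. We assume each gene-level table carries annotation columns for the gene symbol (`Symbol`), the Entrez ID (`EntrezID`) and an ERCC flag (`IsERCC`). These column names are assumptions about the seqc objects, so check `colnames()` on the real data first. We also assume that "remove duplicate gene symbols" means dropping every copy of a repeated symbol.

```r
# Toy gene table: annotation columns + per-sample counts.
# Column names (Symbol, EntrezID, IsERCC) are assumptions, not the verified
# layout of the seqc tables.
genes <- data.frame(
  Symbol   = c("GENE1", "GENE2", "GENE2", "ERCC-00002", "GENE3"),
  EntrezID = c("101", "102", "103", NA, "104"),
  IsERCC   = c(FALSE, FALSE, FALSE, TRUE, FALSE),
  S1       = c(10, 0, 3, 50, 7),
  S2       = c(12, 1, 4, 40, 9),
  stringsAsFactors = FALSE
)

# 1. Drop ERCC spike-ins
keep <- !genes$IsERCC
# 2. Drop genes without an Entrez ID (NA or empty string)
keep <- keep & !is.na(genes$EntrezID) & genes$EntrezID != ""
# 3. Drop every copy of any symbol that occurs more than once
dup_symbols <- genes$Symbol[duplicated(genes$Symbol)]
keep <- keep & !(genes$Symbol %in% dup_symbols)

# Build the count matrix with gene symbols as row names
filtered <- genes[keep, ]
counts <- as.matrix(filtered[, c("S1", "S2")])
rownames(counts) <- filtered$Symbol
```

To keep the first occurrence of each symbol instead of dropping all copies, step 3 can use `!duplicated(genes$Symbol)`.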

Distribution of samples per site
Sample  AGR  BGI  CNL  COH  MAY  NVS  Sum
A         4    5    5    4    5    4   27
B         4    5    5    4    5    4   27
C         4    5    5    4    5    4   27
D         4    5    5    4    5    4   27
Sum      16   20   20   16   20   16  108
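The table is a cross-tabulation of one sample-type label and one site label per library. The sketch below generates those labels directly from the replicate numbers in the table (it does not read the seqc data) and rebuilds the counts, margins included, with base R.

```r
# Sites and the number of replicates of each sample type sequenced at each site
sites <- c("AGR", "BGI", "CNL", "COH", "MAY", "NVS")
reps_per_site <- c(4, 5, 5, 4, 5, 4)

# One entry per library: each of the types A-D gets the same 27 site labels
site <- rep(rep(sites, times = reps_per_site), times = 4)
type <- rep(c("A", "B", "C", "D"), each = sum(reps_per_site))

# Cross-tabulate and add row/column totals ("Sum")
tab <- addmargins(table(type, site))
print(tab)
```

With the real data, both labels could instead be parsed from sample names of the form `A3_BGI` (printed further below), e.g. `substr(x, 1, 1)` for the type and `sub(".*_", "", x)` for the site.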

Figure 2 and Supplementary figure 1

Figure 2a and Supplementary figure 1a: clustering on RSEM data after awst
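To make the two steps in the paper's title concrete, here is a generic illustration of per-sample standardization followed by asymmetric winsorization (clipping the lower and upper tails at different thresholds). This is not the awst algorithm: centring on the median, scaling by the MAD and the cut-offs `lower = -2` and `upper = 4` are hypothetical choices, and `std_winsorize` is a made-up helper. The actual analysis uses the awst package.

```r
# Generic per-sample standardization + asymmetric winsorization.
# NOT the awst implementation: median/MAD scaling and the default cut-offs
# are hypothetical, chosen only for illustration.
std_winsorize <- function(x, lower = -2, upper = 4) {
  lx <- log2(x + 1)
  # Standardize each sample (column) separately; note that mad() is 0 when
  # more than half of a sample's values are identical (e.g. many zeros)
  z <- apply(lx, 2, function(s) (s - median(s)) / mad(s))
  # Clip the two tails at different thresholds; pmin/pmax keep the matrix dims
  pmin(pmax(z, lower), upper)
}
```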

Figure 2b and Supplementary figure 1b: clustering on TPM data after Hart transformation
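A sketch of a Hart-style standardization of TPM values, assuming the transformation follows the half-normal (zFPKM-like) approach: take log2 values, locate the mode of their kernel density, estimate the standard deviation from the values above the mode, and z-score each sample. If X ~ N(mu, sigma), then E[X | X > mu] - mu = sigma * sqrt(2 / pi), which gives the estimator used below. The helper name `hart_transform` is hypothetical and the code is not taken from the repository.

```r
# Hart-style (zFPKM-like) per-sample z-scores of log2 TPM.
# Assumption: half-normal estimate of the spread around the density mode.
hart_transform <- function(tpm) {
  apply(tpm, 2, function(x) {
    lx <- log2(x[x > 0])               # log2 of non-zero values only
    d <- density(lx)                   # kernel density of log2 TPM
    mu <- d$x[which.max(d$y)]          # mode of the density
    u <- mean(lx[lx > mu])             # mean of the values above the mode
    sigma <- (u - mu) * sqrt(pi / 2)   # half-normal estimate of the sd
    (log2(x) - mu) / sigma             # zeros map to -Inf; floor before clustering
  })
}
```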

Figure 2c: clustering on CPM data (std values)

Figure 2d and Supplementary figure 1d: clustering on CPM data (std values; top 1000 genes)
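A minimal sketch of a CPM-based clustering in the spirit of panels 2c and 2d. We assume that "std values" means per-gene z-scores of log2(CPM + 1), that the top genes are those with the highest variance, and that samples are clustered with Euclidean distance and average linkage; none of these choices is confirmed by the original code, and `cluster_cpm` is a hypothetical helper.

```r
# Log-CPM -> per-gene z-scores -> top-variance genes -> hierarchical clustering.
# All methodological choices here are assumptions for illustration.
cluster_cpm <- function(counts, n_top = 1000, k = 4) {
  cpm <- t(t(counts) / colSums(counts)) * 1e6   # counts per million, per sample
  lcpm <- log2(cpm + 1)
  vars <- apply(lcpm, 1, var)
  keep <- which(vars > 0)                       # constant genes cannot be standardized
  top <- keep[order(vars[keep], decreasing = TRUE)][seq_len(min(n_top, length(keep)))]
  z <- t(scale(t(lcpm[top, , drop = FALSE])))   # centre and scale each gene
  hc <- hclust(dist(t(z)), method = "average")  # cluster samples (columns)
  cutree(hc, k = k)
}
```

Calling it with `n_top = Inf` uses all non-constant genes, closer to the setting of panel 2c.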

Supplementary Figures 2 and 3

## [1] "A3_BGI" "D4_BGI" "D4_NVS" "A3_MAY" "C2_BGI" "A1_COH"

3 perturbed samples

Supplementary Figures 2a and 3a: clustering on CPM data (std values; top 1000 genes)

Supplementary Figures 2e and 3e: clustering on RSEM data after awst

Two-thirds of the samples perturbed

Supplementary Figures 2c and 3c: clustering on CPM data (std values; top 1000 genes)

Supplementary Figures 2g and 3g: clustering on RSEM data after awst

All perturbed samples

Supplementary Figures 2d and 3d: clustering on CPM data (std values; top 1000 genes)

Supplementary Figures 2h and 3h: clustering on RSEM data after awst

Figure 3

Radovich procedure

## [1] 15229   108
## [1] 1000  108
## end fraction
## clustered

## clustered

## clustered

## clustered

## Pearson's Chi-squared test
## X-stat = 108 (vCramer = 100%), df = 3, p-value = 2.95608e-23
## Pearson's Chi-squared test
## X-stat = 234.0248 (vCramer = 84.99%), df = 9, p-value = 2.33558e-45
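The output above (15229 genes reduced to 1000 across the 108 samples, followed by the clustering messages) comes from code not shown here; the session info lists ConsensusClusterPlus among the attached packages. As an illustration of the idea only, here is a hand-rolled consensus clustering in base R: samples are repeatedly subsampled and clustered hierarchically, and the fraction of times each pair ends up in the same cluster forms a consensus matrix that is clustered once more. The helpers `top_genes` and `consensus_cluster`, ranking genes by standard deviation, 80% subsampling, 1 - Pearson correlation distance and average linkage are all assumptions.

```r
# Hypothetical gene pre-filter: keep the n genes with the largest sd
top_genes <- function(x, n = 1000) {
  ord <- order(apply(x, 1, sd), decreasing = TRUE)
  x[ord[seq_len(min(n, nrow(x)))], , drop = FALSE]
}

# Minimal consensus clustering of the columns (samples) of x
consensus_cluster <- function(x, k, reps = 50, p_item = 0.8, seed = 1) {
  set.seed(seed)
  n <- ncol(x)
  together <- matrix(0, n, n)  # times i and j fell in the same cluster
  sampled  <- matrix(0, n, n)  # times i and j were both subsampled
  for (r in seq_len(reps)) {
    idx <- sort(sample.int(n, size = floor(p_item * n)))
    d <- as.dist(1 - cor(x[, idx]))              # 1 - Pearson correlation
    cl <- cutree(hclust(d, method = "average"), k = k)
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  consensus <- together / pmax(sampled, 1)       # avoid division by zero
  final <- hclust(as.dist(1 - consensus), method = "average")
  list(consensus = consensus, class = cutree(final, k = k))
}
```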

Consensus with AWST

## end fraction
## clustered

## clustered

## clustered

## clustered

## Pearson's Chi-squared test
## X-stat = 108 (vCramer = 100%), df = 3, p-value = 2.95608e-23
## Pearson's Chi-squared test
## X-stat = 324 (vCramer = 100%), df = 9, p-value = 2.09594e-64
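The chi-squared statistics and Cramér's V values printed above are consistent with the standard definition V = sqrt(X^2 / (n (min(r, c) - 1))): with n = 108 and a 4 x 4 table (df = 9), X^2 = 324 gives V = sqrt(324 / (108 * 3)) = 1, and X^2 = 234.0248 gives V ≈ 0.8499. A base-R sketch follows; the helper `cramers_v` is hypothetical, since the function that produced the output above is not shown.

```r
# Pearson chi-squared test plus Cramér's V for a contingency table.
cramers_v <- function(tab) {
  # No Yates continuity correction (it would only apply to 2 x 2 tables)
  test <- chisq.test(tab, correct = FALSE)
  x2 <- unname(test$statistic)
  v <- sqrt(x2 / (sum(tab) * (min(dim(tab)) - 1)))
  list(statistic = x2, df = unname(test$parameter), p.value = test$p.value, V = v)
}

# Toy example: 4 sample types perfectly separated into 4 clusters of 27
truth   <- rep(c("A", "B", "C", "D"), each = 27)
cluster <- rep(1:4, each = 27)
res <- cramers_v(table(truth, cluster))
```

For this perfectly separated toy table, `res$statistic` is 324 with 9 degrees of freedom and `res$V` is 1.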

Session info

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ConsensusClusterPlus_1.50.0 awst_0.0.3                 
## [3] dendextend_1.12.0           cluster_2.1.0              
## [5] knitr_1.26                 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3          highr_0.8           pillar_1.4.2       
##  [4] compiler_3.6.1      viridis_0.5.1       tools_3.6.1        
##  [7] digest_0.6.23       evaluate_0.14       lifecycle_0.1.0    
## [10] tibble_2.1.3        gtable_0.3.0        viridisLite_0.3.0  
## [13] pkgconfig_2.0.3     rlang_0.4.2         parallel_3.6.1     
## [16] yaml_2.2.0          xfun_0.11           gridExtra_2.3      
## [19] stringr_1.4.0       dplyr_0.8.3         grid_3.6.1         
## [22] tidyselect_0.2.5    Biobase_2.46.0      glue_1.3.1         
## [25] R6_2.4.1            rmarkdown_1.17      ggplot2_3.2.1      
## [28] purrr_0.3.3         magrittr_1.5        BiocGenerics_0.32.0
## [31] scales_1.1.0        htmltools_0.4.0     assertthat_0.2.1   
## [34] colorspace_1.4-1    stringi_1.4.3       lazyeval_0.2.2     
## [37] munsell_0.5.0       crayon_1.3.4